Evaluation and collection of proper name pronunciations online
Abstract
Objective evaluation allows a model to be compared with other similar models. However, automatic pronunciation models should also be extensively evaluated by humans, since the ultimate goal of any pronunciation model is to produce an accurate pronunciation as judged by most people. This paper describes an initiative to evaluate and collect proper name pronunciations online, the development of the US Pronunciation of Proper Names Site (www.pronounce-names.org), and the results obtained so far. The internet, through our web-based interface, has already proven to be a very successful medium both in terms of the number of evaluations and in terms of data collection. In 5 weeks, it has brought to our site 601 users, who have evaluated 477 names and corrected 281 pronunciations. The information gathered is useful for improving our pronunciation models, as well as for (automatically) correcting the pronunciations in the CMU dictionary.
Similar resources
Comparative objective and subjective evaluation of three data-driven techniques for proper name pronunciation
Automatic pronunciation of unknown words is a hard problem of great importance in speech technology. Proper names constitute an especially difficult class of words to pronounce because of their low frequency of occurrence and variable origin. In this paper, we compare three different data-driven approaches which use a dictionary of (known) proper names to infer pronunciations for unknown names,...
Learning linguistically valid pronunciations from acoustic data
We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how “linguistically reasonable” the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists), are used to train a model of w...
Improving Proper Name Recognition by Adding Automatically Learned Pronunciation Variants to the Lexicon
This paper deals with the task of large vocabulary proper name recognition. In order to accommodate a wide diversity of possible name pronunciations (due to non-native name origins or speaker tongues) a multilingual acoustic model is combined with a lexicon comprising 3 grapheme-to-phoneme (G2P) transcriptions (from G2P transcribers for 3 different languages) and up to 4 so-called phoneme-tophon...
Learning Linguistically Valid Pronun
We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how “linguistically reasonable” the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists), are used to train a model of w...
Word Pronunciation Disambiguation using the Web
This paper proposes an automatic method of reading proper names with multiple pronunciations. First, the method obtains Web pages that include both the proper name and its pronunciation. Second, the method feeds them to the learner for classification. The current accuracy is around 90% for open data.